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Given a blackbox for /, a smooth real scalar function of d real variables, one wants to estimate V/ 
at a given point x = (x\, x%, . . . , Xd) with n bits of precision. On a classical computer this requires 
a minimum of d + 1 blackbox queries, whereas on a quantum computer it requires only one query 
regardless of d. The number of bits of precision to which / must be evaluated matches the classical 
requirement in the limit of large n. 

PACS numbers: 03.67.Lx 
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In the context of many numerical calculations, black- 
box query complexity is a natural measure of algorithmic 
efficiency. For example, in optimization problems, the 
function evaluations are frequently the most time con- 
suming part of the computation, and an efficient opti- 
mization algorithm is therefore one which uses as few 
function evaluations as possible pj. 

Here we investigate the query complexity of numeri- 
cally estimating the gradient of a blackbox function at a 
given point. We find that gradients can be estimated on 
a quantum computer using a single blackbox query. The 
algorithm which achieves this can be viewed as a general- 
ization of the Bernstein- VaziraniJ3algorithm, which has 
been described in other contexts The blackbox 

in this algorithm has always previously been described 
as evaluating a function over the integers rather than ap- 
proximating a continuous function with finite precision. 
Gradient finding is the first known practical variant of 
this algorithm. In the question as to whether the al- 
gorithm could be adapted for any task of practical inter- 
est was presented as an open problem, which this paper 
resolves. 

The blackbox that we consider takes as its input d 
binary strings, each of length n, along with n ancilla 
qubits which should be set to zero. The blackbox writes 
its output into the ancilla bits using addition modulo 2 n ° 
and preserves the input bits. This is a standard technique 
for making any function reversible, which it must be for 
a quantum computer to implement it. 

The purpose of the blackbox is to evaluate some func- 
tion / : K d — * K with n bits of precision on a finite 
domain. It does so in fixed-point notation. That is, the 
inputs and outputs to the function /, which are real num- 
bers within a finite range, are approximated by the inputs 
and outputs to the blackbox, which are integers ranging 
from to 2 n , and to 2™°, respectively, via appropriate 
scaling and offset. 

For numerical gradient estimation to work, in the clas- 
sical or quantum case, / and its first partial deriva- 
tives must be continuous in the vicinity of the point 
(x%, X2, ■ ■ ■ , Xd) at which the gradient is to be evaluated. 
Classically, to estimate V/ at x in d dimensions, one can 
evaluate / at x and at d additional points, each displaced 
from / along one of the d dimensions. 

In practice, it may be desirable in the classical gradient 



estimation algorithm to perform the function evaluations 
displaced by ±2/2 from x along each dimension so that 
the region being sampled is centered at x. In this case 2d 
function evaluations are required instead of d+1. df/dx\ 
will then be given by (f(x 1 +l/2, ...)- f{x x - 1/2, . . .))//, 
and similarly for the other partial derivatives. Inserting 
the Taylor expansion for / about x into this expression 
shows that the quadratic terms will cancel, leaving an er- 
ror of order I 2 and higher. One must choose I sufficiently 
small that these terms are negligible. 

Now we consider the quantum case. It suffices to show 
how to perform a quantum gradient estimation at x = 
0, since the gradient at other points can be obtained 
by trivially redefining /. To estimate the gradient at 
the origin, start with d input registers of n qubits each, 
plus a single output register of n Q qubits, all initialized 
to zero. Perform the Hadamard transform on the input 
registers, write the value 1 into the output register and 
then perform an inverse Fourier transform on it. This 
yields the superposition 
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where N = 2 n and N a = 2™°. In vector notation, 
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Next, use the blackbox to compute / and add it mod- 
ulo N into the output register. The output register is an 
eigenstate of addition modulo N a . The eigenvalue corre- 
sponding to addition of x is e l27TX / N °. Thus by writing 
into the output register via modular addition, we obtain 
a phase proportional to /. This technique is sometimes 
called phase kickback. The resulting state is 
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where N is the d-dimensional vector (JV, N, N, . . .), and 
I is the size of the region over which / is approximately 
linear. I and N are used to convert from the components 
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of S, which are nonnegative integers represented by bit 
strings, to rationals evenly spaced over a small region 
centered at the origin. Similarly, the blackbox output is 
related to the value of / by a — ► a(B \ N ^ /] mod N a . m 
is the size of the interval which bounds the components 
of V/. This ensures proper scaling of the final result into 
a fixed point representation, that is, as an integer from 
to 2". 

For sufficiently small I, 



that bounds for the values of the components are known, 
which is a requirement for any algorithm using fixed- 
point arithmetic. 

In general the number of bits of precision necessary 
to represent a set of values is equal to log 2 (r/<5), where 
r is the range of values, and 5 is the smallest difference 
in values one wishes to distinguish. Thus for classical 
gradient estimation with n bits of precision, one needs to 
evaluate / to 
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Writing out the vector components, and ignoring 
global phase, the input registers are now approximately 
in the state 
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This is a product state: 
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bits of precision. 

An important property of the quantum Fourier trans- 
form is that it can correctly distinguish between expo- 
nentially many discrete planewave states with high prob- 
ability without requiring the phases to be exponentially 
precise 0- It is not hard to show that if each phase 
is accurate to within 9 then the inner product between 
the ideal state and the actual state is at least cos 9, and 
therefore the algorithm will still succeed with probability 
at least cos 2 9. 

As shown earlier, the phase acquired by "kickback" is 
equal to 2 ^f-/, and therefore, for the phase to be accurate 
to within ±9, f must be evaluated to within ±^j^9. 
Thus, recalling that N = 2™, 



Fourier transform each of the registers, obtaining 
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Then simply measure in the computational basis to 
obtain the components of V/ with n bits of precision. 
Because / will in general not be perfectly linear, even 
over a small region, there also will be nonzero amplitude 
to measure other values close to the exact gradient, as 
will be discussed later. 

Normally, the quantum Fourier transform is thought 
of as mapping the discrete planewave states to the com- 
putational basis states: 
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where < k < N. However, negative k is also easily 
dealt with, since 
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Thus negative components of V/ pose no difficulties 
for the quantum gradient estimation algorithm provided 
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As an example, if 9 = 7r/8, then the algorithm will be- 
have exactly as in the idealized case with approximately 
85% probability, and N a will exceed the classically re- 
quired precision by four bits, for a given value /. I also 
differs between the quantum and classical cases, as will 
be discussed later. Thus n differs from the classically 
required precision only by an additive constant which 
depends on 9 and I. Because the classical and quantum 
precision requirements are both proportional to 77, this 
difference becomes negligible in the limit of large 77. 

The only approximation made in the description of the 
quantum gradient estimation algorithm was expanding 
/ to first order. Therefore the lowest order error term 
will be due to the quadratic part of /. The behavior of 
the algorithm in the presence of such a quadratic term 
provides an idea of its robustness. Furthermore, in order 
to minimize the number of bits of precision to which / 
must be evaluated, I should be chosen as large as possible 
subject to the constraint that / be locally linear. The 
analysis of the quadratic term provides a more precise 
description of this constraint. 

The series of quantum Fourier transforms on differ- 
ent registers can be thought of as a single <i-dimensional 
quantum Fourier transform. Including the quadratic 
term, the state which this Fourier transform is acting 
on has amplitudes 
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where H is the Hessian matrix of /. After the Fourier 
transform, the amplitudes should peak around the cor- 
rect value of V/. Here we are interested in the width 
of the peak, which should not be affected by V/, so for 
simplicity it will be set to 0. The Fourier transform will 
yield amplitudes of @ 
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Ignoring global phase and doing a change of variables 
(u = S/N), 
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This integral can be approximated using the method of 
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otherwise 



Hu — k. Thus (again ignoring 
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where C is the region — 1/2 < Ui < 1/2 V i. So ac- 
cording to the stationary phase approximation, the peak 
is simply a region of uniform amplitude, with zero am- 
plitude elsewhere. Geometrically, this region is what is 
obtained by applying the linear transformation to 
the d-dimensional unit hypercube. 

Since we have set Vf = 0, the variance of —4^- will 

be 
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and D is the region of nonzero amplitude. Doing a 
change of variables with A as the Jacobian, 
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where C is again the unit hypercube centered at the 
origin. In components, 
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The expectation values on a hypercube of uniform 
probability 



j^Sij, thus 
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This quadratic dependence on N is just as expected 
since, at the end of the computation, the register that 
we are measuring is intended to contain — . Therefore 

° m ox 

the uncertainty in df jdxi is approximately 
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independent of N . In the classical algorithm which 
uses 2d function evaluations, the cubic term introduces 
an error of a ~ I4-D3 where D3 is the typical magni- 
tude of third partial derivatives of /. If the 2 nd partial 
derivatives of / have a magnitude of approximately D2 
then the typical uncertainty in the quantum case will be 

lD ^J^ - ■ To obtain a given uncertainty a, 
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Recalling Eq. JJ) and (J2J) , the number of bits of pre- 
cision to which / must be evaluated depends logarithmi- 
cally on I. However, in the limit of large n, the number 
of bits will match the classical requirement. 

The level of accuracy of the stationary phase approxi- 
mation can be assessed by comparison to numerical solu- 
tions of example cases. In one dimension, Eq. @ reduces 
to cr 2 = 



2 N- 



—, — where a — §^§j£< Figures [3] and ^ display 
the close agreement between numerical results and the 
analytical solution obtained using stationary phase. 

A two dimensional example provides a nontrivial test 
of the stationary phase method's prediction of the peak 
shape. If the Hessian is such that 
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then, according to the stationary phase approximation, 
the peak should be a square of side length with a 
45° rotation. This is in reasonable agreement with the 
numerical result, as shown in figure 

Because this algorithm requires only one blackbox 
query, one might expect that it could be run recursively 
to efficiently obtain higher derivatives. In this case, an- 
other instance of the same algorithm serves as the black- 
box. However, the algorithm itself differs from the black- 
box in that the blackbox has scalar output which it adds 
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FIG. 1: Comparison between error estimates obtained in 
the stationary phase approximation (solid line) and numerical 
results (points) for the one dimensional case. Here the 2 nd 
derivative remains constant (a = 0.02), and the number of 
bits to which the gradient is being evaluated is varied. 
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FIG. 2: Analytical error estimates (solid line) are again tested 
against numerical results (the points) in the one dimensional 
case, varying the 2 nd derivative instead of the number of bits. 
Here N = 80. 



modulo N a to the existing value in the output register, 
and it does not incur any input-dependent global phase. 
An additive scalar output can be obtained by minor mod- 
ification to this algorithm, but the most straightforward 
techniques for eliminating the global phase require an ad- 
ditional blackbox query, thus necessitating 2™ queries for 
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FIG. 3: On the left, the probability, as numerically cal- 
culated, is shown. The areas of highest probability appear 
darkest. On the right, the region of nonzero probability, as 
calculated in the stationary phase approximation, is shaded 
in black. 

the evaluation of an n th partial derivative, just as in the 
classical case. 

The problem of global phase when recursing quantum 
algorithms as well as the difficulties inherent in recursing 
approximate or probabilistic algorithms are not specific 
to gradient finding but are instead fairly general. 

Efficient gradient estimation may be useful, for ex- 
ample, in some optimization and rootfinding algorithms. 
Furthermore, upon discretization, the problem of mini- 
mizing a functional is converted into the problem of min- 
imizing a function of many variables, which might ben- 
efit from gradient descent techniques. A speedup in the 
minimization of functionals may in turn enable more ef- 
ficient solution of partial differential equations via the 
Euler-Lagrange equation. The analysis of the advantage 
which this technique can provide in quantum numerical 
algorithms remains open for further research. 
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